{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "4facdf7f-680e-4d28-908b-2b8408e2a741",
      "metadata": {},
      "source": [
        "# How to call tools with multimodal data\n",
        "\n",
        ":::info Prerequisites\n",
        "\n",
        "This guide assumes familiarity with the following concepts:\n",
        "\n",
        "- [Chat models](/docs/concepts/chat_models)\n",
        "- [LangChain Tools](/docs/concepts/tools)\n",
        "\n",
        ":::\n",
        "\n",
        "Here we demonstrate how to call tools with multimodal data, such as images.\n",
        "\n",
        "Some multimodal models, such as those that can reason over images or audio, support [tool calling](/docs/concepts/#tool-calling) features as well.\n",
        "\n",
        "To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).\n",
        "\n",
        "Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai) and [Anthropic](/docs/integrations/platforms/anthropic). We will use the same image and tool in all cases. Let's first select an image, and build a placeholder tool that expects as input the string \"sunny\", \"cloudy\", or \"rainy\". We will ask the models to describe the weather in the image.\n",
        "\n",
        ":::note\n",
        "The `tool` function is available in `@langchain/core` version 0.2.7 and above.\n",
        "\n",
        "If you are on an older version of core, you should use instantiate and use [`DynamicStructuredTool`](https://api.js.langchain.com/classes/langchain_core.tools.DynamicStructuredTool.html) instead.\n",
        ":::"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "id": "0d9fd81a-b7f0-445a-8e3d-cfc2d31fdd59",
      "metadata": {},
      "outputs": [],
      "source": [
        "import { tool } from \"@langchain/core/tools\";\n",
        "import { z } from \"zod\";\n",
        "\n",
        "const imageUrl = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\";\n",
        "\n",
        "const weatherTool = tool(async ({ weather }) => {\n",
        "  console.log(weather);\n",
        "  return weather;\n",
        "}, {\n",
        "  name: \"multiply\",\n",
        "  description: \"Describe the weather\",\n",
        "  schema: z.object({\n",
        "    weather: z.enum([\"sunny\", \"cloudy\", \"rainy\"])\n",
        "  }),\n",
        "});"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8656018e-c56d-47d2-b2be-71e87827f90a",
      "metadata": {},
      "source": [
        "## OpenAI\n",
        "\n",
        "For OpenAI, we can feed the image URL directly in a content block of type \"image_url\":"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "id": "a8819cf3-5ddc-44f0-889a-19ca7b7fe77e",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[\n",
            "  {\n",
            "    name: \"multiply\",\n",
            "    args: { weather: \"sunny\" },\n",
            "    id: \"call_ZaBYUggmrTSuDjcuZpMVKpMR\"\n",
            "  }\n",
            "]\n"
          ]
        }
      ],
      "source": [
        "import { HumanMessage } from \"@langchain/core/messages\";\n",
        "import { ChatOpenAI } from \"@langchain/openai\";\n",
        "\n",
        "const model = new ChatOpenAI({\n",
        "  model: \"gpt-4o\",\n",
        "}).bindTools([weatherTool]);\n",
        "\n",
        "const message = new HumanMessage({\n",
        "  content: [\n",
        "    {\n",
        "      type: \"text\",\n",
        "      text: \"describe the weather in this image\"\n",
        "    },\n",
        "    {\n",
        "      type: \"image_url\",\n",
        "      image_url: {\n",
        "        url: imageUrl\n",
        "      }\n",
        "    }\n",
        "  ],\n",
        "});\n",
        "\n",
        "const response = await model.invoke([message]);\n",
        "\n",
        "console.log(response.tool_calls);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e5738224-1109-4bf8-8976-ff1570dd1d46",
      "metadata": {},
      "source": [
        "Note that we recover tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling) in the model response."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0cee63ff-e09f-4dd8-8323-912edbde94f6",
      "metadata": {},
      "source": [
        "## Anthropic\n",
        "\n",
        "For Anthropic, we can format a base64-encoded image into a content block of type \"image\", as below:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "d90c4590-71c8-42b1-99ff-03a9eca8082e",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[\n",
            "  {\n",
            "    name: \"multiply\",\n",
            "    args: { weather: \"sunny\" },\n",
            "    id: \"toolu_01HLY1KmXZkKMn7Ar4ZtFuAM\"\n",
            "  }\n",
            "]\n"
          ]
        }
      ],
      "source": [
        "import * as fs from \"node:fs/promises\";\n",
        "\n",
        "import { ChatAnthropic } from \"@langchain/anthropic\";\n",
        "import { HumanMessage } from \"@langchain/core/messages\";\n",
        "\n",
        "const imageData = await fs.readFile(\"../../data/sunny_day.jpeg\");\n",
        "\n",
        "const model = new ChatAnthropic({\n",
        "  model: \"claude-3-sonnet-20240229\",\n",
        "}).bindTools([weatherTool]);\n",
        "\n",
        "const message = new HumanMessage({\n",
        "  content: [\n",
        "    {\n",
        "      type: \"text\",\n",
        "      text: \"describe the weather in this image\",\n",
        "    },\n",
        "    {\n",
        "      type: \"image_url\",\n",
        "      image_url: {\n",
        "        url: `data:image/jpeg;base64,${imageData.toString(\"base64\")}`,\n",
        "      },\n",
        "    },\n",
        "  ],\n",
        "});\n",
        "\n",
        "const response = await model.invoke([message]);\n",
        "\n",
        "console.log(response.tool_calls);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a66b7d2f",
      "metadata": {},
      "source": [
        "## Google Generative AI\n",
        "\n",
        "For Google GenAI, we can format a base64-encoded image into a content block of type \"image\", as below:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "id": "f8184909",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[ { name: 'multiply', args: { weather: 'sunny' } } ]\n"
          ]
        }
      ],
      "source": [
        "import { ChatGoogleGenerativeAI } from \"@langchain/google-genai\";\n",
        "import axios from \"axios\";\n",
        "import { ChatPromptTemplate, MessagesPlaceholder } from \"@langchain/core/prompts\";\n",
        "import { HumanMessage } from \"@langchain/core/messages\";\n",
        "\n",
        "const axiosRes = await axios.get(imageUrl, { responseType: \"arraybuffer\" });\n",
        "const base64 = btoa(\n",
        "  new Uint8Array(axiosRes.data).reduce(\n",
        "    (data, byte) => data + String.fromCharCode(byte),\n",
        "    ''\n",
        "  )\n",
        ");\n",
        "\n",
        "const model = new ChatGoogleGenerativeAI({ model: \"gemini-1.5-pro-latest\" }).bindTools([weatherTool]);\n",
        "\n",
        "const prompt = ChatPromptTemplate.fromMessages([\n",
        "  [\"system\", \"describe the weather in this image\"],\n",
        "  new MessagesPlaceholder(\"message\")\n",
        "]);\n",
        "\n",
        "const response = await prompt.pipe(model).invoke({\n",
        "  message: new HumanMessage({\n",
        "    content: [{\n",
        "      type: \"media\",\n",
        "      mimeType: \"image/jpeg\",\n",
        "      data: base64,\n",
        "    }]\n",
        "  })\n",
        "});\n",
        "console.log(response.tool_calls);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c5dd4ef4",
      "metadata": {},
      "source": [
        "### Audio input\n",
        "\n",
        "Google's Gemini also supports audio inputs. In this next example we'll see how we can pass an audio file to the model, and get back a summary in structured format."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "id": "c04c883e",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[\n",
            "  {\n",
            "    name: 'summary_tool',\n",
            "    args: { summary: 'The video shows a person clapping their hands.' }\n",
            "  }\n",
            "]\n"
          ]
        }
      ],
      "source": [
        "import { SystemMessage } from \"@langchain/core/messages\";\n",
        "import { tool } from \"@langchain/core/tools\";\n",
        "\n",
        "const summaryTool = tool((input) => {\n",
        "  return input.summary;\n",
        "}, {\n",
        "  name: \"summary_tool\",\n",
        "  description: \"Log the summary of the content\",\n",
        "  schema: z.object({\n",
        "    summary: z.string().describe(\"The summary of the content to log\")\n",
        "  }),\n",
        "});\n",
        "\n",
        "const audioUrl = \"https://www.pacdv.com/sounds/people_sound_effects/applause-1.wav\";\n",
        "\n",
        "const axiosRes = await axios.get(audioUrl, { responseType: \"arraybuffer\" });\n",
        "const base64 = btoa(\n",
        "  new Uint8Array(axiosRes.data).reduce(\n",
        "    (data, byte) => data + String.fromCharCode(byte),\n",
        "    ''\n",
        "  )\n",
        ");\n",
        "\n",
        "const model = new ChatGoogleGenerativeAI({ model: \"gemini-1.5-pro-latest\" }).bindTools([summaryTool]);\n",
        "\n",
        "const response = await model.invoke([\n",
        "  new SystemMessage(\"Summarize this content. always use the summary_tool in your response\"),\n",
        "  new HumanMessage({\n",
        "  content: [{\n",
        "    type: \"media\",\n",
        "    mimeType: \"audio/wav\",\n",
        "    data: base64,\n",
        "  }]\n",
        "})]);\n",
        "\n",
        "console.log(response.tool_calls);"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "TypeScript",
      "language": "typescript",
      "name": "tslab"
    },
    "language_info": {
      "codemirror_mode": {
        "mode": "typescript",
        "name": "javascript",
        "typescript": true
      },
      "file_extension": ".ts",
      "mimetype": "text/typescript",
      "name": "typescript",
      "version": "3.7.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
